Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GC bug fix for max_collect_interval computation #45727

Merged
merged 1 commit into from
Jun 21, 2022
Merged

GC bug fix for max_collect_interval computation #45727

merged 1 commit into from
Jun 21, 2022

Conversation

kpamnany
Copy link
Contributor

Currently, max_collect_interval is constrained to totalmem / ncores / 2 for _P64 which results in a very short collect interval when you're running with a smaller number of threads on a machine with many cores.

Change this to totalmem / nthreads / 2 which, for two of our tests, resulted in 40% and 60% runtime reduction (!!) as well as
GC time reduction from 46% to 10% and 64% to 11%.

Spotted by @janrous-rai.

@kpamnany kpamnany added the GC Garbage collector label Jun 17, 2022
@KristofferC KristofferC added the backport 1.8 Change should be backported to release-1.8 label Jun 17, 2022
Copy link
Member

@Sacha0 Sacha0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Kiran! :)

@janrous-rai
Copy link

janrous-rai commented Jun 17, 2022 via email

@janrous-rai
Copy link

janrous-rai commented Jun 17, 2022 via email

@kpamnany
Copy link
Contributor Author

It's done after GC setup by default.

Not sure exactly when it changed, but GC initialization happens after threads are initialized.

@kpamnany
Copy link
Contributor Author

The tester_linux32 error is an OutOfMemoryError. Retrying now, but might this be being caused by this PR? @oscardssmith?

Currently constrained to `totalmem / ncores / 2` for `_P64` which
results in a very short collect interval when you're running with
a smaller number of threads on a machine with many cores.

Changes this to `totalmem / nthreads / 2` which, for two of our
tests, resulted in 40% and 60% runtime reduction (!!) as well as
GC time reduction from 46% to 10% and 64% to 11%.
@oscardssmith
Copy link
Member

If this PR causes OOMs, something else is broken. @chflood any thoughts?

@gbaraldi
Copy link
Member

32 bit likes to OOM a lot, or more specifically run out of address space so it could be something else.

@kpamnany
Copy link
Contributor Author

tester_linux32 passed on the retry so I think this is good to go.

@oscardssmith oscardssmith merged commit c4c36ed into JuliaLang:master Jun 21, 2022
@kpamnany kpamnany deleted the kp/fix-maxmem branch June 21, 2022 18:32
@KristofferC
Copy link
Member

Trying to run with this backported to 1.8 I get:

Floating point exception (core dumped)

I guess jl_n_threads is not initialized?

cc @kpamnany

@kpamnany
Copy link
Contributor Author

kpamnany commented Jul 4, 2022

I'll check. Should I open a PR to a particular branch?

@KristofferC
Copy link
Member

backports-release-1.8

@KristofferC KristofferC mentioned this pull request Jul 5, 2022
36 tasks
KristofferC pushed a commit that referenced this pull request Jul 5, 2022
* Bug fix for `max_collect_interval` computation (#45727)

Currently constrained to `totalmem / ncores / 2` for `_P64` which
results in a very short collect interval when you're running with
a smaller number of threads on a machine with many cores.

Changes this to `totalmem / nthreads / 2` which, for two of our
tests, resulted in 40% and 60% runtime reduction (!!) as well as
GC time reduction from 46% to 10% and 64% to 11%.

* Move GC init after threading init

To allow use of `jl_n_threads` in GC initialization.
@KristofferC KristofferC removed the backport 1.8 Change should be backported to release-1.8 label Jul 6, 2022
pcjentsch pushed a commit to pcjentsch/julia that referenced this pull request Aug 18, 2022
Currently constrained to `totalmem / ncores / 2` for `_P64` which
results in a very short collect interval when you're running with
a smaller number of threads on a machine with many cores.

Changes this to `totalmem / nthreads / 2` which, for two of our
tests, resulted in 40% and 60% runtime reduction (!!) as well as
GC time reduction from 46% to 10% and 64% to 11%.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
GC Garbage collector
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants